ARCS-Motif: discovering correlated motifs from unaligned biological sequences
Abstract
MOTIVATION The goal of motif discovery is to detect novel, unknown, and important signals in biological sequences. In most models, the importance of a motif equals the sum of the similarity scores of its individual positions. In 2006, Song et al. introduced the Aggregated Related Column Score (ARCS) measure, which incorporates correlation information into the evaluation of motif importance, and showed that ARCS is superior to other measures. Because the ARCS motif model is complicated, existing sequential motif discovery methods cannot be applied directly to find motifs with high ARCS values.

RESULTS This article presents a novel mining algorithm, ARCS-Motif, to discover related sequential motifs in biological sequences. ARCS-Motif is applied to 400 PROSITE datasets and compared with five alternative methods (CONSENSUS, Gibbs sampler, MEME, SPLASH and DIALIGN-TX). ARCS-Motif outperforms all of these methods in accuracy and most of them in efficiency. Although SPLASH is more efficient than ARCS-Motif, ARCS-Motif is much more accurate than SPLASH. On average, ARCS-Motif produces motifs that are at least 10% better than those of the best alternative method. Among the 400 PROSITE datasets, ARCS-Motif produces the best motifs for more than 200 families. Excluding SPLASH, the execution time of ARCS-Motif is less than a third of that of the fastest alternative method, and among all methods its execution time grows at the slowest rate with respect to the number of sequences and the average sequence length.
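The contrast drawn in the abstract, between scoring a motif as a sum of independent per-position scores and scoring that also accounts for correlation between columns, can be illustrated with a small sketch. This is not the ARCS measure, whose definition is not given here. As stand-ins, the sketch assumes per-column information content against a uniform background for the position-independent score and mutual information between two columns for the correlation signal. The function names and the toy DNA sites are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one column, assuming a uniform background."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def independent_score(sites, alphabet_size=4):
    """Position-independent motif score: the sum of per-column scores."""
    return sum(column_information(col, alphabet_size) for col in zip(*sites))


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two columns (illustrative stand-in, not ARCS)."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


# Two toy sets of aligned sites with identical per-column composition.
correlated = ["ACGT", "ACGT", "TCGA", "TCGA"]    # columns 0 and 3 co-vary
uncorrelated = ["ACGT", "ACGA", "TCGT", "TCGA"]  # columns 0 and 3 vary independently

for name, sites in (("correlated", correlated), ("uncorrelated", uncorrelated)):
    cols = list(zip(*sites))
    print(name, independent_score(sites), mutual_information(cols[0], cols[3]))
```

Both toy sets receive the same position-independent score because their columns have the same composition, yet only the first shows dependence between its first and last columns. A measure that sums per-position similarity alone cannot separate the two cases, which is the kind of information a correlation-aware score is meant to capture.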
Similar resources
Genetic Algorithm Based Probabilistic Motif Discovery in Multiple Unaligned Biological Sequences
Many computational approaches have been introduced for the problem of motif identification in a set of biological sequences, which are classified according to the type of motifs discovered. In this study, we propose a model to discover motifs in a large set of unaligned sequences in considerably reduced time using a genetic algorithm based probabilistic motif discovery model. The proposed algorith...
Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find...
A Combinatorial Approach for Motif Discovery in Unaligned DNA Sequences
Motif (conserved pattern) modelling and finding in unaligned DNA sequences is a fundamental problem in computational biology with important applications in understanding gene regulation. Biological approaches for this problem are tedious and time-consuming. Large amounts of genome sequence data and gene expression micro-array data let us solve this problem computationally. Most computer science...
Fitting a mixture model by expectation maximization to discover motifs in biopolymers
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successi...
Discovering common stem-loop motifs in unaligned RNA sequences.
Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to...
Journal title:
- Bioinformatics
Volume 25, Issue 2
Pages -
Publication date: 2009